INVARIANT BAYESIAN ESTIMATION ON MANIFOLDS

By Ian H. Jermyn 
INRIA 

A frequent and well-founded criticism of the maximum a posteri- 
ori (MAP) and minimum mean squared error (MMSE) estimates of 
a continuous parameter $\gamma$ taking values in a differentiable manifold $\Gamma$
is that they are not invariant to arbitrary "reparameterizations" of $\Gamma$.
This paper clarifies the issues surrounding this problem, by pointing 
out the difference between coordinate invariance, which is a sine qua 
non for a mathematically well-defined problem, and diffeomorphism 
invariance, which is a substantial issue, and then provides a solution. 
We first show that the presence of a metric structure on $\Gamma$ can be
used to define coordinate-invariant MAP and MMSE estimates, and
we argue that this is the natural way to proceed. We then discuss the
choice of a metric structure on $\Gamma$. By imposing an invariance criterion
natural within a Bayesian framework, we show that this choice
is essentially unique. It does not necessarily correspond to a choice 
of coordinates. In cases of complete prior ignorance, when Jeffreys' 
prior is used, the invariant MAP estimate reduces to the maximum 
likelihood estimate. The invariant MAP estimate coincides with the 
minimum message length (MML) estimate, but no discretization or 
approximation is used in its derivation. 



1. Introduction. Statistical estimation is a very old field, but despite 
that many questions remain unanswered and debates about the best way to 
proceed are plentiful. From a probabilistic point of view, all the information 
about a quantity of interest taking values in a space $\Gamma$ is contained in a
probability measure on $\Gamma$. If it is deemed necessary to single out a particular
point $\gamma \in \Gamma$ for some purpose, a loss function
$L : \Gamma \times \Gamma \to \mathbb{R} : (\gamma, \gamma') \mapsto L(\gamma, \gamma')$
is defined describing the cost inherent in taking the true value of the quantity
to be $\gamma$ when it is in fact $\gamma'$. The mean value of the loss as a function of
$\gamma$ can be computed using the probability measure, whereupon one can, for
example, choose that point $\gamma \in \Gamma$ that minimizes the mean loss as one's
estimate of the true value of $\gamma$.

In some cases, especially those closely linked to a specific application, 
the loss function will be fully dictated by circumstance. In this case, the 
invariance issues discussed in this paper do not arise. However, in many other 
cases, and for the purposes of theoretical analysis, estimates are needed in 
the absence of any clear knowledge of what the real loss is. Indeed, there may 
not be a "real loss." In these cases, generic loss functions are required, and 
indeed are currently widely used, in both theory and practice. These generic 
loss functions should satisfy two criteria: they must be well-defined, and they 
must not introduce implicit bias that is not present in the models. The latter 
is best expressed by saying that in the absence of prior knowledge about the 
loss function, the loss function should not introduce prior knowledge about 
the parameters to be estimated. This is an application of the principle that 
if two people have the same knowledge, then they should make the same 
inferences. 

In the case that $\Gamma$ is a differentiable manifold, difficulties arise. Two popular
choices of generic loss function are the negative of a delta function
and the squared difference of coordinates, leading to maximum a posteriori 
(MAP) and minimum mean squared error (MMSE) estimates, respectively. 
In order for these quantities to be well defined, two things are necessary: 
an underlying measure in order to define the delta function loss, and a dis- 
tance function in order to define the squared error. The existence of these 
quantities is normally ignored, or equivalently they are assumed to take on 
particular forms. The resulting loss functions are not coordinate-invariant,
and hence are ill-defined in general coordinate systems, thus violating the 
first criterion. This lack of coordinate invariance leads to the paradox that 
two people with the same knowledge can construct different estimates simply 
by choosing to use different coordinate systems, for example, polar rather 
than rectangular. Even if the definitions are made coordinate-invariant, and 
hence well-defined, the resulting loss functions still violate the second criterion
in general. The estimates are not invariant to diffeomorphisms, which
"mix up" the points of $\Gamma$ ("reparameterizations"), and therefore necessarily
introduce extra information about these points. 

The purpose of this paper is to correct the above situation. We define 
compatible, coordinate-invariant MAP and MMSE estimates by introducing 
a Riemannian metric on $\Gamma$, and argue that this is the natural way to achieve
such invariance. This satisfies the first criterion. The introduction of a metric 
raises the question of how to choose this extra structure, and we argue that 
in the case of Bayesian estimation, imposing the second criterion renders 
the choice of metric unique. 

The main results of the paper as regards Bayesian estimation are the 
following: 

(a) The metric on $\Gamma$ should be the pullback by the model function of the
natural metric on every measure space. 

(b) Invariant MAP estimates should be defined using the density of the 
posterior measure with respect to the measure derived from this metric. 

(c) Invariant MMSE estimates should be defined by using, in place of the 
squared error, the squared geodesic distance based on this metric. 

(d) In conditions of "complete ignorance," that is, conditions in which the 
prior probability measure is Jeffreys' prior, MAP estimates always re- 
duce to maximum likelihood (ML) estimates, in contrast to much Bayesian 
argument and practice. 

(e) The invariant MAP estimate coincides with the MML estimate de- 
scribed by Wallace and Freeman (1987), except that no discretization 
of $\Gamma$ is required and no approximations are made.

The rest of the paper is structured thus. In Section 2 we discuss the failure 
of invariance for MAP estimation on manifolds and its causes. In Section 3 
we describe how both this problem, and the related failure of invariance for 
MMSE estimates, can be solved by endowing the manifold with a metric 
structure, and we argue that this is the natural solution to the problem. 
In Section 4 we discuss the choice of metric structure, and use a simple 
invariance argument to render this choice unique. In Section 5 we discuss 
the conclusions of the report and related work. 

The material on the differential geometry of measure spaces and its con- 
nection to Jeffreys' prior may be known to geometrically minded statisti- 
cians. We include it here for completeness, and to emphasize its coordinate- 
invariant nature. 

2. The problem. To illustrate the problem, we examine the maximiza- 
tion of a probability density function (p.d.f.) on a manifold of dimension 
$m$. Let the manifold be $\Gamma$, a point in $\Gamma$ being denoted $\gamma$. We are given a
probability measure $Q$ on $\Gamma$, which we may view as the posterior in an
MAP estimation task, although this is not important at this stage. We are
also given two systems of coordinates on $\Gamma$, $\theta : \Gamma \to \mathbb{R}^m$ and $\phi : \Gamma \to \mathbb{R}^m$. (We
ignore questions of topology that might require us to use more than one 
coordinate patch; the issue is not central to the discussion here.) 

Expressed in terms of the first set of coordinates $\theta$, and the corresponding
measure $d^m\theta(\gamma)$ on $\Gamma$, we find $Q = Q_\theta(\theta(\gamma))\, d^m\theta(\gamma)$, where $Q_\theta(\theta(\gamma))$ is a
function. We now separate the function $Q_\theta$ from the measure and find the
argument of its maximum value $\theta_{\max} \in \mathbb{R}^m$, giving an estimate of $\gamma$,
$\hat{\gamma}_\theta = \theta^{-1}(\theta_{\max})$.

We may choose to express $Q$ in another coordinate system, $\phi : \Gamma \to \mathbb{R}^m$.
Using the measure defined by this coordinate system, we find that
$Q = Q_\phi(\phi(\gamma))\, d^m\phi(\gamma)$. If we now follow the same procedure as before, and find
the argument of the maximum value of $Q_\phi$, $\phi_{\max}$, we find another estimate,
$\hat{\gamma}_\phi = \phi^{-1}(\phi_{\max})$.

The problem is the following. Suppose that the two coordinate systems
are related by a function $\alpha : \mathbb{R}^m \to \mathbb{R}^m$ so that $\theta(\gamma) = \alpha(\phi(\gamma))$. In this case,
the measures with respect to the two coordinate systems are related by
$d^m\theta(\gamma) = J[\alpha](\phi(\gamma))\, d^m\phi(\gamma)$, where $J[\alpha](\phi(\gamma))$ is the Jacobian of the coordinate
transformation. This in turn means that the functions $Q_\theta$ and $Q_\phi$
are related by $Q_\phi(\phi(\gamma)) = Q_\theta(\theta(\gamma))\, J[\alpha](\alpha^{-1}(\theta(\gamma)))$.

The consequence is that the estimates obtained by maximizing $Q_\theta$ and
$Q_\phi$ are different, due to the presence of the Jacobian factor. Apparently our
estimate of $\gamma$ depends upon the choice of coordinates, or in effect upon the
whim of the person making the estimate. This may seem surprising: one
thinks of the question "What is the most probable point in $\Gamma$?" and, by
analogy with the discrete case, one expects an invariant answer. 

The difference between the continuous and the discrete cases means, however,
that the question being asked in the continuous case is not the previously
cited one at all, but a slightly more complicated version. Given a
coordinate system, $\theta$, the question being asked is, "What is the infinitesimal
volume element $\theta^{-1}(dz)$ in $\Gamma$ (where $dz$ is an infinitesimal coordinate
volume in $\mathbb{R}^m$) that is most likely to contain the true point in $\Gamma$?" (We use
the notation $f^{-1}$ both for the inverse of a map $f : A \to B$, $f^{-1} : B \to A$, and
for the pullback $f^{-1} : 2^B \to 2^A : B \supset Y \mapsto \{a \in A : f(a) \in Y\}$. Context serves
to distinguish the two usages.) Using a different coordinate system, $\phi$, on
the other hand, the question is "What is the infinitesimal volume element
$\phi^{-1}(dz)$ that is most likely to contain the true point in $\Gamma$?" In general,
$\theta^{-1}(dz) \neq \phi^{-1}(dz)$. It is then clear that different answers are to be expected
using different coordinate systems, because the question being asked is different
in each case.

A simple example of the above is provided by a Gaussian measure in two
dimensions with zero mean and covariance proportional to the identity. This
measure can be expressed in rectangular or in polar coordinates:

$$\Pr(\mathbf{r}) = dx\, dy\, Z^{-1} e^{-(x^2 + y^2)} = dr\, d\theta\, Z^{-1} r e^{-r^2}.$$

In the first case, the maximum density procedure leads to $x = y = 0$, while
in the second it leads to $\hat{r} = 1/\sqrt{2}$ and an indeterminate value for $\theta$. In
this simple case, one can see the error clearly, but in more complex or less 
intuitive cases the same phenomenon arises and passes unnoticed. 
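
The coordinate dependence in this example can be checked numerically. The following Python sketch is an illustration only: it assumes the density $Z^{-1} e^{-(x^2+y^2)}$ with $Z = \pi$, and the function names and grid sizes are hypothetical choices made here. It maximizes the coordinate density on a rectangular grid and, separately, the polar coordinate density $Z^{-1} r e^{-r^2}$ along the radial direction.

```python
import numpy as np

def density_rect(x, y):
    # Coordinate density with respect to dx dy (normalizer Z = pi).
    return np.exp(-(x**2 + y**2)) / np.pi

def density_polar(r):
    # Coordinate density with respect to dr dtheta; the factor r is the Jacobian.
    # It does not depend on theta, so theta is left indeterminate.
    return r * np.exp(-r**2) / np.pi

def argmax_rect(n=401, half_width=3.0):
    # Grid search for the maximizer of the rectangular coordinate density.
    xs = np.linspace(-half_width, half_width, n)
    X, Y = np.meshgrid(xs, xs, indexing="ij")
    vals = density_rect(X, Y)
    i, j = np.unravel_index(np.argmax(vals), vals.shape)
    return xs[i], xs[j]

def argmax_polar_radius(n=30001, r_max=3.0):
    # Grid search for the radius maximizing the polar coordinate density.
    rs = np.linspace(0.0, r_max, n)
    return rs[np.argmax(density_polar(rs))]

x_hat, y_hat = argmax_rect()
r_hat = argmax_polar_radius()
print("rectangular maximizer:", x_hat, y_hat)
print("polar maximizer radius:", r_hat)
```

Under these assumptions the rectangular search returns the origin, while the polar search returns a radius close to $1/\sqrt{2}$, so the two procedures select different points of the plane.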

From a measure-theoretic point of view, what is happening is clear. The 
functions $Q_\theta$ and $Q_\phi$ are probability density functions. Any p.d.f. is defined
with respect to an underlying measure. The Radon-Nikodym derivative of 
the probability measure with respect to the underlying measure then gives 
the p.d.f. In the scenario just described, two different underlying measures 
are being used: $d^m\theta(\gamma)$ and $d^m\phi(\gamma)$. To expect them to yield the same results
is unreasonable. 

If one concentrates on the underlying measure, then there is no problem. 
In terms of $\theta$, the underlying measure is $d^m\theta(\gamma)$, while in terms of $\phi$, the
same underlying measure is $J[\alpha](\phi(\gamma))\, d^m\phi(\gamma)$. Integration of either of these
over a fixed subset of $\Gamma$ will produce the same result: they are the same
measure. Using this fixed measure, the problem disappears: in terms of $\phi$,
the p.d.f. with respect to the underlying measure is $Q_\theta(\alpha(\phi(\gamma))) = Q_\theta(\theta(\gamma))$.
The maxima of $Q_\theta(\alpha(\phi(\gamma)))$ with respect to $\phi$ agree completely with those of
$Q_\theta(\theta(\gamma))$ with respect to $\theta$, in the sense that
$\theta_{\max} = \alpha(\phi_{\max})$, which implies
that $\theta^{-1}(\theta_{\max}) = \phi^{-1}(\phi_{\max})$. The points in $\Gamma$ that we find are the same. The
problem is that, given an arbitrary coordinate system, we do not know which
choice of coordinate is "correct," and hence what the estimate should be. By
effectively focusing on measures on $\mathbb{R}^m$, the coordinate space, rather than
on underlying measures on $\Gamma$, the problem is created. How then to define,
in a coordinate-invariant way, an underlying measure with respect to which 
to take the Radon-Nikodym derivative? 

A similar situation arises with respect to MMSE estimates, which also lack 
invariance under general changes of coordinates. It is equally true that the 
mean itself has no coordinate-invariant meaning, and for the same reasons. 
In calculating both the error and the mean, one is faced with adding or sub- 
tracting certain values. If these operations are performed on the coordinate 
values in a particular coordinate system, they will change with a change of 
coordinates. Equally, one cannot add or subtract points of $\Gamma$ directly; such
operations are not defined unless $\Gamma$ possesses an algebraic structure of some
kind, for example, is a vector space. 

In practice, what is crucial to the MMSE estimate is the notion of a 
distance between two points in $\Gamma$. If a global Euclidean coordinate system
exists, this is given by the squared error, but in general this is not the case. 
If we wish to consider MMSE estimates in general coordinate systems, we 
must be able to define distances in a coordinate-invariant manner. 

3. Coordinate-invariant estimates. If one wishes to discuss measures and 
distances using an arbitrary set of coordinates, one must express the mathematics
in a way that allows for this eventuality. Not to do so means
that symbols such as $d^m\theta$ are not defined. The natural way to express
both geometric and measure-theoretic information about manifolds in a 
way that is manifestly free of coordinates, but that nevertheless allows 
the derivation of an expression in terms of an arbitrary coordinate sys- 
tem with the greatest of ease, is the language of forms. Readers not fa- 
miliar with this language may wish to look at the Appendix, where we 
provide a brief introduction to forms and their uses, or at the book by 
Choquet-Bruhat, DeWitt-Morette and Dillard-Bleick (1977). 

We are interested in probability measures. These can be integrated over
$m$-chains, for example, the whole manifold $\Gamma$, and as such are $m$-forms. In
addition, they must be positive and normalized, so that they are probability
$m$-forms. The answer to the first of the questions at the end of the last
section is then: define an $m$-form, since these are, by definition, coordinate-invariant.
The answer to the second question would seem to be: define a
distance function. In practice, the following considerations push us strongly
in one direction: the introduction of a Riemannian metric on the manifold
$\Gamma$.

First, the introduction of a metric allows us simultaneously to answer 
both of the questions posed at the end of the last section. Starting from the 
metric, we can derive an m-form and use this as the underlying measure. 
We can also define a distance function, as the geodesic distance between two 
points. 

Second, if we are to introduce notions both of "volume" (via an underlying 
measure) and of "length" (via a distance function), it is sensible that these 
notions be compatible. Otherwise there is no reason to believe that the 
resulting estimates will bear any relation to one another. The use of a metric 
to define both the underlying measure and the distance function ensures that 
maps that preserve lengths preserve volumes also, or, even more intuitively, 
that the volume of a small cube is given by the product of the lengths of its 
sides. 

The final consideration is intuition in practice. Manifolds with a measure 
but no metric are strange objects. They do not correspond to our intuition
of a surface or volume at all. The space of volume-preserving diffeomorphisms
is much larger than the space of isometries, and allows severe
distortions. An example is the mixing of two incompressible immiscible fluids.
The initial "drop of oil in water" may end up smoothly distorted into
dramatically different shapes. The parameter spaces that we consider intu- 
itively possess "metric-like" properties, even if these are not well defined. 
For a one-dimensional $\Gamma$, for example, the numbers that represent different
parameter values indicate something more than the topological, although a 
precise interpretation may not be available. If we wish to be able to describe 
these geometric properties of the manifold as well as its measure-theoretic 
properties, a metric is necessary. In addition, it is quite hard to write down 
an expression for a measure on a manifold without implicitly assuming a 
metric. In practice, this means that metrics appear, albeit disguised, in the 
expressions for many probability measures. Gaussian measures are one ex- 
ample, where an inner product is used to define the exponent. An inner 
product on a vector space is equivalent to a constant metric, which allows 
identification of each tangent space with the vector space itself. In many 
other cases, the assumption of a Euclidean metric is made manifest by the 
appearance of an orthogonal inner product. 

What then is a Riemannian metric and how does it define a measure? A
metric $h$ is the assignment, to each point $\gamma$ of $\Gamma$, of an inner product on
the tangent space $T_\gamma\Gamma$ at $\gamma$. This is detailed in the Appendix, where it is
further explained how the existence of a metric allows us to map functions to
$m$-forms using the Hodge star. Given a function $f$ on $\Gamma$, we can thus create
an $m$-form, that is, a measure $\star_h f$. The choice of function $f$ is dictated
by compatibility between the measure-theoretic and geometric aspects of
the manifold. By choosing $f$ to be $\mathbf{1}$, the function identically equal to 1,
the resulting $m$-form is preserved by isometries; in other words, maps that
preserve length preserve volume also.

Being a form, the quantity $U_h = \star_h \mathbf{1}$ is invariantly defined. This is clear
first because no coordinate system was used in its construction, but it can
also be verified in detail. As described in the Appendix, the expression for
this form in the coordinate basis of coordinates $\theta$ is

$$U_h = \star_h \mathbf{1} = |h|_\theta^{1/2}\, d^m\theta,$$

where $|h|_\theta$ is the determinant of the metric components in the $\theta$ coordinate
basis, and $d^m\theta$ is the coordinate basis element for the space of $m$-forms. To
see the invariance of this measure explicitly, note that a change of coordinates
$\alpha$ introduces a factor of $J[\alpha](\phi(\gamma))$ from $d^m\theta$, while the transformation
of the square root of the determinant of the metric matrix elements from one
basis to another introduces a factor of $J[\alpha](\phi(\gamma))^{-1}$. Thus, expressed in any
coordinate system, the form of the measure is identical:
$|h|_\theta^{1/2}\, d^m\theta = |h|_\phi^{1/2}\, d^m\phi$. To stress
the point once again: the measure $d^m\theta(\gamma)$ has no coordinate-invariant meaning.
If we try to express a measure in a general coordinate system in this
way, we literally do not know what we are talking about.

3.1. Maximum density estimates. Given a probability $m$-form $Q$, and
another positive $m$-form $U$, one defines the p.d.f. of $Q$ with respect to $U$
by division:

(3.1) $$\tilde{Q} = \frac{Q}{U}.$$

This is the equivalent of the Radon-Nikodym derivative in the language of
forms. What now becomes of maximum density estimation? We simply have
to use $U_h$ in (3.1). If we choose a particular coordinate system $\theta$, so that
$Q = Q_\theta\, d^m\theta$ and $U_h = |h|_\theta^{1/2}\, d^m\theta$, then we have

(3.2) $$\tilde{Q} = |h|_\theta^{-1/2} Q_\theta.$$

The left-hand side of this equation is invariant to changes in coordinates.
These will produce equal Jacobian factors in both the numerator and the
denominator of (3.2), which will thus cancel out. Note also that this p.d.f.
does not result simply from a choice of coordinates. Although it may be possible
to find a system of coordinates in which the determinant of the metric
is constant, this is misleading in two ways. First, what is really happening is 
that a metric is being chosen. The naive approach really means choosing a 
metric whose determinant is constant in the coordinate system you already 
have, which is not a coordinate-invariant procedure. Second, in more than 
one dimension, although the determinant of the metric may be constant, it 
may not be possible to find a system of coordinates in which the metric itself 
is constant. This would imply that the manifold is flat, a statement that is 
coordinate-invariant and may not be true. 
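
As an illustration of the invariant density in (3.2), the following Python sketch revisits the two-dimensional Gaussian example in polar coordinates. It assumes the Euclidean metric on the plane, which is an assumption made here purely for illustration (the choice of metric is the subject of Section 4); in polar coordinates its components are $\mathrm{diag}(1, r^2)$, so the square root of the determinant is $r$. The function names and the grid are hypothetical.

```python
import numpy as np

def q_polar(r):
    # Coordinate density of Q with respect to dr dtheta (normalizer Z = pi).
    return r * np.exp(-r**2) / np.pi

def sqrt_det_h_polar(r):
    # Euclidean metric in polar coordinates: h = diag(1, r^2), so |h|^(1/2) = r.
    return r

def invariant_density_polar(r):
    # Density of Q with respect to the metric volume form, as in (3.2).
    return q_polar(r) / sqrt_det_h_polar(r)

# Radial grid; it starts just above zero to avoid dividing by r = 0.
rs = np.linspace(1e-6, 3.0, 30001)

r_naive = rs[np.argmax(q_polar(rs))]
r_invariant = rs[np.argmax(invariant_density_polar(rs))]
print("naive polar maximizer radius:", r_naive)
print("invariant maximizer radius:", r_invariant)
```

With this assumed metric, the naive coordinate density peaks near $r = 1/\sqrt{2}$, whereas the invariant density peaks at the smallest radius on the grid, that is, at the origin, in agreement with the rectangular-coordinate result.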

3.1.1. Expression in terms of a delta function loss. Usually the maximum
density estimate is regarded as derived from the use of a particular loss
function, $\delta(\theta(\gamma), \theta(\gamma'))$ on $\Gamma$. Given a probability $m$-form expressed in terms
of $\theta$, $Q_\theta(\theta)\, d^m\theta$, this leads to the familiar recipe $\hat{\gamma} = \theta^{-1}(\arg\max_\theta Q_\theta(\theta))$, in
apparent contradiction to the previous discussion. From this point of view, 
there is no need to define a p.d.f. at all, since we were merely integrating 
with respect to the probability measure. What is going on? 

The answer of course involves the same concepts as above. The quantity 
$\delta(\theta(\gamma), \theta(\gamma'))$ is not invariantly defined, since the measure against which to
integrate it has not been given. In our context, the delta function (in fact
there are effectively $m$ of them) is best viewed as the identity map from
$\Lambda^p\Gamma$, the space of $p$-forms on $\Gamma$, to itself. As such, it is a $p$-form at its first
argument (a point in $\Gamma$) and an $(m-p)$-form at its second argument (another
point in $\Gamma$). It can thus be integrated against a $p$-form to produce another
$p$-form. When $p = 0$, we recover the usual delta function that evaluates a
function at its first argument. In our case, however, we wish to integrate
the delta function against an $m$-form, and thus $p = m$. The delta function is
thus an $m$-form at its first argument and a 0-form, or function, at its second
argument. The result of integrating it against the posterior measure is thus
an $m$-form, and to create a function that we can maximize, we need to use
the Hodge star. This again introduces the factor of $|h|_\theta^{-1/2}$ that we see in
(3.2) and that is implicit in (3.1).

An alternative point of view is to consider the delta function as a map 
from $\Lambda^p\Gamma$ to $\Lambda^{m-p}\Gamma$, making it an $(m-p)$-form at its first argument and
a $p$-form at its second argument. In order to integrate this against a $p$-form,
we can use the inner product on $\Lambda^p$ described in (A.2) of the Appendix. In
our case, this point of view makes the delta function a 0-form (function) at
its first argument and an $m$-form at its second. The result of the integration
is thus a function as required for maximization, but now we find that the
use of the inner product has already introduced the factor of $|h|_\theta^{-1/2}$, thus
giving the same result as in the other two methods.

There is thus no conflict between these different ways of speaking.

3.2. MMSE estimates. Suppose we are given a distance function. That is,
we are given a symmetric map $d : \Gamma \times \Gamma \to \mathbb{R}^+$, obeying the triangle inequality
and such that $d(\gamma, \gamma) = 0$. Given a point $\gamma$, we define the function

$$d_\gamma(\gamma') = d(\gamma, \gamma').$$

We can now define the coordinate-invariant form of the mean squared error,
which we will call the mean squared distance, as

(3.3) $$L(\gamma) = \int_\Gamma (d_\gamma)^2\, Q,$$

where $Q$ is as usual a probability $m$-form. In terms of a particular coordinate
system $\theta$ on $\Gamma$, one has

$$L(\theta) = \int d^m\theta'\, Q_\theta(\theta')\, d_\theta^2(\theta, \theta'),$$

where $d_\theta$ is the expression for the length in terms of the given coordinates.

Having defined the mean squared distance $L$, we can now define the minimum
mean squared distance (MMSD) estimate as the set of minimizers of $L$.
All that remains is to use the metric to define a distance function that 
we can use in (3.3). Below we recap this material from differential geometry, 
phrasing it in a manifestly coordinate-invariant way, and emphasizing the 
difference between coordinate invariance and invariance to diffeomorphisms, 
which is a coordinate-invariant and therefore content-full concept. We first 
define the notion of the length of a path, and then define the distance be- 
tween two points as the length of a minimum length path between them. 

Let $I$ be an interval of the real line, considered as a manifold (i.e., without
the structure of a field). Let $p_0$ and $p_1$ be the elements of its boundary. Let
$\pi : I \to \Gamma$ be an embedding of $I$ in $\Gamma$ such that $\pi(p_0) = \gamma$ and $\pi(p_1) = \gamma'$. To
define the length of the path (i.e., its volume), we need a 1-form on $I$, or in
other words a measure, which we will then integrate over $I$. Now, however,
we have an invariance criterion: we must ensure that the length we calculate
depends only on the image of $I$ in $\Gamma$, and not on the precise mapping of points
of $I$ to points of $\Gamma$. This amounts to saying that replacing $\pi$ by $\pi\epsilon$, where $\epsilon$
is an arbitrary boundary-preserving diffeomorphism of $I$, should not change the
resulting length. Note that unlike coordinate invariance on $I$, which follows
as soon as we integrate over the coordinates, this condition is a substantive
one. As argued in the Appendix, the only way to ensure this is to construct
a metric on $I$ by pulling back a metric from $\Gamma$, and then using this metric
in the normal way to construct a 1-form. We thus pull back the metric $h$ on
$\Gamma$ to give a metric $\pi^*h$ on $I$. We then use the Hodge star of this metric to
map $\mathbf{1}$ to a 1-form that can be integrated on $I$. In notation,

(3.4) $$l(\pi) = \int_I \star_{\pi^*h} \mathbf{1}.$$

To illustrate the ability to derive an expression in an arbitrary coordinate
system from the coordinate-invariant expression (3.4), we introduce a coordinate
system $t : I \to \mathbb{R}$ on $I$ with a corresponding coordinate basis given
by $\frac{\partial}{\partial t}(p)$, and a coordinate system $\theta$ on $\Gamma$ with a corresponding coordinate
basis given by $\frac{\partial}{\partial \theta^i}(\gamma)$. In these bases, the (single) component of the pulled
back metric can be found to be

$$(\pi^*h)_{tt}(p) = h_{ij}(\pi(p))\, \frac{\partial \pi^i}{\partial t}(p)\, \frac{\partial \pi^j}{\partial t}(p),$$

where $h_{ij}$ are the components of the metric $h$ in the $\theta$ coordinate system,
and repeated indices are summed. Thus the result is simply the squared length
of the tangent vector to the path $\pi$ in the metric $h$. Rewriting (3.4) in terms
of this expression, we find that

$$l(\pi) = \int_a^b dt\, \left( h_{ij}(\pi(t))\, \dot{\pi}^i(t)\, \dot{\pi}^j(t) \right)^{1/2},$$

where we have abused notation by using the same symbol $\pi$ for the map
from $I$ to $\Gamma$ and its expression in terms of coordinates. The points $a \in \mathbb{R}$ and
$b \in \mathbb{R}$ are the coordinate values of $p_0$ and $p_1$, respectively.
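
The coordinate expression for the path length, and its independence of the parameterization of the path, can be checked numerically. The Python sketch below is an illustration under stated assumptions: the manifold is the plane with the Euclidean metric written in polar coordinates $(r, \theta)$, the path is a quarter circle of radius 2, and the reparameterization $\epsilon(t) = (t + t^2)/2$ is a hypothetical boundary-preserving diffeomorphism of $[0, 1]$ chosen here. Derivatives are approximated by finite differences and the integral by the trapezoidal rule.

```python
import numpy as np

def metric_polar(point):
    # Components h_ij of the Euclidean metric in polar coordinates (r, theta).
    r, _theta = point
    return np.array([[1.0, 0.0], [0.0, r**2]])

def path_length(path, a=0.0, b=1.0, n=5001):
    # Approximates the integral over [a, b] of sqrt(h_ij pi'^i pi'^j) dt.
    ts = np.linspace(a, b, n)
    points = np.array([path(t) for t in ts])      # shape (n, 2)
    velocities = np.gradient(points, ts, axis=0)  # finite-difference d(pi^i)/dt
    speeds = np.array([
        np.sqrt(v @ metric_polar(p) @ v) for p, v in zip(points, velocities)
    ])
    # Trapezoidal rule over the parameter interval.
    return float(np.sum(0.5 * (speeds[1:] + speeds[:-1]) * np.diff(ts)))

def quarter_circle(t):
    # Quarter circle of radius 2, in polar coordinates.
    return np.array([2.0, 0.5 * np.pi * t])

def reparameterized(t):
    # Same image, traversed via epsilon(t) = (t + t^2) / 2.
    return quarter_circle(0.5 * (t + t**2))

length_a = path_length(quarter_circle)
length_b = path_length(reparameterized)
print(length_a, length_b)
```

Both parameterizations should give a length close to $\pi$, the length of a quarter circle of radius 2, since the two maps have the same image in the plane.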

Given the length of a path, we can now define the distance between two 
points as 

$$d_\gamma(\gamma') = d(\gamma, \gamma') = \min_{\pi \in \Pi(\gamma, \gamma')} l(\pi),$$

where $\Pi(\gamma, \gamma')$ is the space of paths with endpoints $\gamma$ and $\gamma'$. This distance
is coordinate-invariant, and can be used in (3.3). For a general metric it is
of course hard to derive an analytic expression for d. 

In the case that the metric is Euclidean, L reduces to the mean squared 
error, as it should. The resulting MMSD estimate is then the mean, that 
is, the MMSE estimate, and is unique. In other cases, the MMSD estimate 
provides a generalized mean, known as the "Karcher mean," first introduced 
by Karcher (1977) as the center of mass on a Riemannian manifold. It is a set 
of points in $\Gamma$, each of which minimizes the mean squared distance to every
other point of $\Gamma$. Note that the set of minimizers may contain more than
one point of $\Gamma$. This does not present a problem as such. It simply means
that from the point of view of the mean squared distance loss function these 
points are equivalent. 
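
A small numerical sketch may help to make the MMSD estimate and its possible non-uniqueness concrete. The Python code below is illustrative only and rests on assumptions made here: the manifold is the unit circle with geodesic (arc length) distance, the probability measure is replaced by a finite set of weighted points as a discrete stand-in, and the minimizers are found by a grid search. All function names, the grid size and the tolerance are hypothetical.

```python
import numpy as np

def geodesic_distance_circle(a, b):
    # Arc length distance between angles a and b on the unit circle.
    diff = (a - b + np.pi) % (2.0 * np.pi) - np.pi
    return np.abs(diff)

def mean_squared_distance(candidates, points, weights):
    # Discrete analogue of (3.3): weighted sum of squared geodesic distances.
    d = geodesic_distance_circle(candidates[:, None], points[None, :])
    return (d**2) @ weights

def mmsd_estimates(points, weights, n_grid=3600, tol=1e-9):
    # Returns every grid angle whose loss is within tol of the minimum.
    candidates = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    loss = mean_squared_distance(candidates, points, weights)
    return candidates[loss <= loss.min() + tol]

# Two nearby points: the minimizer is essentially unique (close to 0.4).
near = mmsd_estimates(np.array([0.2, 0.6]), np.array([0.5, 0.5]))

# Two antipodal points: two equivalent minimizers, close to -pi/2 and +pi/2.
antipodal = mmsd_estimates(np.array([0.0, np.pi]), np.array([0.5, 0.5]))

print(near)
print(antipodal)
```

In the antipodal case the search returns two angles, which illustrates that the MMSD estimate is in general a set of points rather than a single point.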

4. Bayesian estimation and the choice of metric. We have argued that 
in order to define coordinate-invariant and consistent maximum density and 
MMSE estimates, one should use a metric on the manifold $\Gamma$. We now turn
to the question that we have been conspicuously avoiding. How is one to
choose a metric on $\Gamma$?

Thus far, we have been dealing solely with a manifold $\Gamma$ and a probability
measure $Q$ on this manifold. In this abstract situation, it seems that
the above question has no good answer, which is unsurprising. We turn 
now, however, to the case that is usually of interest: when Q is a posterior 
probability measure derived from a model function and a prior using Bayes' 
theorem. 

We introduce the data space, $X$. We assume that this has sufficient structure
to allow the following constructions, and in practice it can be supposed
to be either a countable set or a manifold. On $X$ one can define the space
of measures $\mathcal{M}(X)$. The space of probability measures, $\mathcal{S}(X)$, is a proper
subset of the cone of positive measures. This set has a complicated boundary
even in the case where $X$ is countable; when $X$ is not countable, there are
also measures with singular components, which complicate things still further.
We avoid these difficulties by assuming that all measures with which we
will deal lie in the interior of $\mathcal{M}(X)$ and, where appropriate, are nonsingular.

We are free to choose coordinates on $\mathcal{M}(X)$ as on any manifold. One
choice is to describe measures as $n$-forms, in which case the space $\mathcal{S}(X)$
becomes the space of probability $n$-forms. A model function is a map
$A : \Gamma \to \mathcal{M}(X)$ associating to each point $\gamma \in \Gamma$ a (probability) measure on $X$. We
will assume that this map is a regular embedding, so that the image of $\Gamma$
with the differentiable structure induced by $A$ is a submanifold of $\mathcal{S}(X)$.

4.1. An invariance criterion. We now use this extra structure, which is 
present in any real estimation problem, to argue for a unique choice of metric 
on $\Gamma$. The argument rests on one simple idea: that all information about the
parameters not contained in the data be contained in the prior measure,
or in other words, that all information that distinguishes one point of $\Gamma$
from another should come either from their correspondences with probability
measures on $X$ (condition 1) or from the prior measure on $\Gamma$ (condition 2).
It is the probability measures on $X$ alone that determine the relationship
between the points in $\Gamma$ and the observations represented by points in $X$,
and the way that these measures are parameterized serves to determine
the meaning of the points in $\Gamma$ and not the other way around. Any other
information in addition to the data we have at hand should be described by
the prior. Any metric that we choose on $\Gamma$ should respect this principle, and
not introduce any extra information about points in $\Gamma$. This is the second
criterion. 

The fact that it is not the identity of individual points in $\Gamma$ that is important,
but merely their correspondence with probability measures on $X$,
means that it is only the image of $\Gamma$ in $\mathcal{M}(X)$ that counts. This image is
invariant under the replacement of $A$ by $A\epsilon$, where $\epsilon : \Gamma \to \Gamma$ is a diffeomorphism.
A model function is thus an equivalence class of maps $\{A\epsilon\}$. The
conclusion from condition 1 is thus that inference should be invariant under
the replacement of $A$ by $A\epsilon$, where invariant means that the image of the
estimate by the model function is preserved. This diffeomorphism invariance,
although superficially similar to a change of coordinates, is defined
independently of any change of coordinates, and as such is a substantive 
restriction. 

There are only two ways to achieve this aim. One is to pick a particular 
representative of the equivalence class of maps $\{A\epsilon\}$ and to define a metric
on the corresponding copy of $\Gamma$. This metric can then be pulled back to
other members of the equivalence class using the maps $\epsilon$. Although this will
satisfy condition 1, the selection of a particular member of the equivalence
class to be endowed with a particular metric implies that we already know
something about the points in $\Gamma$ independently of their correspondence with
probability measures on $X$. Otherwise, how could we know to which points of
$\Gamma$ to assign which values of the metric? This is exactly the type of information
that should be included in the prior, and thus the procedure described in 
this paragraph violates condition 2. 

The second approach is to pull back a metric from M(X) to each
equivalent copy of Γ using Λε. [Since an embedding is a full rank immersion,
the pulled back metric will be a proper Riemannian structure on Γ if M(X)
is a proper Riemannian manifold.] Such a metric automatically satisfies the
consistency conditions introduced by the maps ε between members of the
equivalence class: $(\Lambda\epsilon)^* g = \epsilon^* \Lambda^* g$, where g is a metric on M(X),
and thus our results will depend solely on the image of Γ in M(X). In
addition, we were not required to pick a particular member of the class a
priori, since each member of the equivalence class gets its own consistent
metric induced by its own model function. Thus both condition 1 and
condition 2 are satisfied.
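
The consistency relation between pulled back metrics can be checked directly from the definition of the pullback. The following short derivation is a sketch, assuming that Λε denotes the composition of Λ with ε and using the chain rule for tangent maps, $(\Lambda\epsilon)_* = \Lambda_* \epsilon_*$; here u and v stand for arbitrary tangent vectors at a point of the relevant copy of Γ.

```latex
\begin{align*}
((\Lambda\epsilon)^* g)(u, v)
  &= g\big((\Lambda\epsilon)_* u,\ (\Lambda\epsilon)_* v\big) \\
  &= g\big(\Lambda_* (\epsilon_* u),\ \Lambda_* (\epsilon_* v)\big) \\
  &= (\Lambda^* g)(\epsilon_* u,\ \epsilon_* v) \\
  &= (\epsilon^* \Lambda^* g)(u, v).
\end{align*}
```

Nothing in this chain refers to a preferred member of the equivalence class, which is why the construction is compatible with condition 2.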

We are thus in a position to define a metric and underlying m-form on Γ
that satisfies the invariance criterion stated at the beginning of this section
by pulling back a metric from M(X). We lack only one thing: a metric on
M(X) to pull back.

4.2. Metrics on M(X). The first thing we must do is to define what
we mean by the tangent space to M(X). Since we are using n-forms as
coordinates on M(X), and since the space of signed measures is linear, it
is easy to see that a tangent vector to M(X) can be identified with an
n-form. If we restrict attention to probability measures, this n-form must
integrate to zero to preserve normalization. Then, at a point T ∈ M(X), an
inner product between two tangent vectors $v_1$ and $v_2$ is given by

\[ g_T(v_1, v_2) = \int_X T \, \frac{v_1}{T} \, \frac{v_2}{T}, \]

where we have identified the abstract tangent vectors v with their expression
as n-forms. Note that the divisions are well defined because T is positive.
The justifications for this choice as the only reasonable metric on M(X) are
many, and we do not reiterate them here. Interested readers can consult, for
example, the book by Amari (1985).
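
A finite sample space gives a concrete feel for this inner product and for its pullback. The sketch below is illustrative only: it assumes the Fisher-Rao form g_p(v1, v2) = sum over x of v1(x) v2(x) / p(x) for a strictly positive probability vector p, and it uses a Bernoulli family as a hypothetical one-parameter model function. The function names are not from the paper.

```python
def fisher_rao_inner(p, v1, v2):
    """Inner product of two tangent vectors at the probability vector p.

    p is strictly positive; v1 and v2 are signed vectors summing to zero.
    """
    if any(px <= 0.0 for px in p):
        raise ValueError("p must be strictly positive")
    return sum(a * b / px for a, b, px in zip(v1, v2, p))


def bernoulli(theta):
    """Hypothetical model function: theta -> measure on {0, 1}."""
    return [1.0 - theta, theta]


def bernoulli_tangent(theta):
    """Derivative of the family with respect to theta (sums to zero)."""
    return [-1.0, 1.0]


def pulled_back_metric(theta):
    """Single component of the pulled back metric at theta."""
    p = bernoulli(theta)
    v = bernoulli_tangent(theta)
    return fisher_rao_inner(p, v, v)


# The pulled back component agrees with the Fisher information of a
# Bernoulli variable, 1 / (theta * (1 - theta)).
theta = 0.25
print(pulled_back_metric(theta), 1.0 / (theta * (1.0 - theta)))
```

The agreement with the textbook Fisher information is the finite-dimensional counterpart of the pullback construction used in the next subsection.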

4.3. Pullback to Γ. Using the embedding Λ of Γ in M(X), we can pull
the metric on M(X) back to Γ. The definition of the pullback of the metric
acting on two tangent vectors u and v in $T_\gamma\Gamma$ is as before

\[ h_\Lambda(u, v) = (\Lambda^* g)_\gamma(u, v)
   = g_{\Lambda(\gamma)}(\Lambda_* u, \Lambda_* v), \]

where $\Lambda_* : T_\gamma\Gamma \to T_{\Lambda(\gamma)}M(X)$ is the tangent (derivative) map. This
expression is coordinate-invariant. If we wish to know the matrix elements of
$h_\Lambda = \Lambda^* g$ in the basis determined by a system of coordinates, θ, on Γ,
we must evaluate $h_\Lambda$ on these basis elements. The result is

\[ h_{\Lambda,ij}(\theta) = \int_X \frac{1}{\Lambda_\theta}
   \frac{\partial\Lambda_\theta}{\partial\theta^i}
   \frac{\partial\Lambda_\theta}{\partial\theta^j}, \]

where we denote by $\Lambda_\theta$ the value of the model function at the point γ with
coordinates θ. We thus find the known result that the components of the
induced metric form the Fisher information matrix.

As described in Section 3, the coordinate-invariant measure on Γ is then
given by

\[ U_\Lambda = *_{h_\Lambda} 1 = |h_\Lambda|^{1/2} \, d^m\theta. \]

4.4. MAP estimates. MAP estimation is now simply a question of using
(3.1) with Q equal to the posterior measure from Bayes' theorem, and U
equal to $U_\Lambda$.

Note that the introduction of a prior probability prevents the estimate
from being invariant under replacement of Λ by Λε. The solution to this
problem is the following. The prior probability is assigned to one member
of the equivalence class {Λε} based on knowledge of the parameters that is
independent of current data. It can then be pushed forward to other copies
of Γ using $\epsilon^{-1}$. Note that this violates condition 2 as it should, but that it
does not violate condition 1.

In cases of "complete ignorance" of the value of γ, Jeffreys' prior is often
used as the prior probability measure. In this case, the prior measure and
the underlying measure cancel in the invariant MAP estimate, leaving only
the model function. In cases of "complete ignorance," then, MAP estimation
reduces to maximum likelihood estimation regardless of the nature of
Jeffreys' prior. (Note that the posterior probability measure still contains
Jeffreys' prior; it is in the MAP estimate itself that it disappears.)
Thus while traditional MAP estimates of the variance of a Gaussian
measure, for example, vary with the parameterization, the invariant MAP
estimate will produce the maximum likelihood result in every case. The data
space is $X = \mathbb{R}^n$, corresponding to n independent experiments, and the
model function is a Gaussian family of product measures on $\mathbb{R}^n$, for the sake
of argument with zero mean. The parameter space Γ is isomorphic to $\mathbb{R}_+$;
we use coordinates $\sigma \in \mathbb{R}_+$ on this space, where σ is the standard deviation.
The model function Λ is then given by

\[ \Lambda_\sigma = d^n x \, (2\pi\sigma^2)^{-n/2}
   \exp\left\{ -\frac{\langle x, x \rangle}{2\sigma^2} \right\}, \]

where ⟨·,·⟩ denotes the Euclidean inner product on $\mathbb{R}^n$. Derivation of the
Fisher information then shows that the inner product between tangent
vectors u and v in $T_\gamma\Gamma$, where the point γ has coordinate σ, is

\[ h_\Lambda(u, v) = \frac{2n}{\sigma^2} \, u^\sigma v^\sigma, \tag{4.1} \]

where the superscript σ denotes the component with respect to the
coordinate basis $\partial/\partial\sigma$. The induced measure is thus proportional to dσ/σ,
the well-known Jeffreys' prior. Let us now consider the parameterization
$\nu = \sigma^\alpha$, for α ∈ ℕ. Jeffreys' prior is proportional to dν/ν for all α ≠ 0.
The traditional MAP estimates derived from these different parameterizations
are

\[ \hat{\nu}^{2/\alpha} = \frac{\langle x, x \rangle}{n + \alpha}, \]

where we have raised the estimate of ν to the power of 2/α to make it
equivalent to an estimate of σ². The problem of lack of invariance comes
sharply into focus in this example. Which estimate of σ is to be used?
On the other hand, the invariant MAP estimate is

\[ \hat{\sigma}^2 = \frac{\langle x, x \rangle}{n} \]

for all α.
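
The contrast between the two estimates can be reproduced numerically. The sketch below is an illustration under stated assumptions, not the paper's procedure: it maximizes the posterior density in the coordinate ν = σ^α directly (traditional MAP) and then maximizes the same density divided by the density of the underlying measure, taken proportional to 1/ν (invariant MAP). The optimizer, the search interval and all function names are choices made here for illustration; s stands for ⟨x, x⟩.

```python
import math

GOLDEN = (math.sqrt(5.0) - 1.0) / 2.0


def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    a, b = lo, hi
    c = b - GOLDEN * (b - a)
    d = a + GOLDEN * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:            # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + GOLDEN * (b - a)
            fd = f(d)
        else:                  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - GOLDEN * (b - a)
            fc = f(c)
    return 0.5 * (a + b)


def log_posterior_density(nu, alpha, s, n):
    """Log posterior density in the nu coordinate, up to a constant.

    Zero-mean Gaussian likelihood with Jeffreys' prior d(nu)/nu.
    """
    sigma = nu ** (1.0 / alpha)
    log_likelihood = -n * math.log(sigma) - s / (2.0 * sigma ** 2)
    return log_likelihood - math.log(nu)


def log_underlying_density(nu):
    """Log density of the underlying measure, proportional to d(nu)/nu."""
    return -math.log(nu)


def traditional_map_sigma2(alpha, s, n):
    """Maximize the coordinate density in nu, then convert to sigma^2."""
    f = lambda t: log_posterior_density(math.exp(t), alpha, s, n)
    nu_hat = math.exp(golden_section_max(f, -20.0, 20.0))
    return nu_hat ** (2.0 / alpha)


def invariant_map_sigma2(alpha, s, n):
    """Maximize the posterior density relative to the underlying measure."""
    f = lambda t: (log_posterior_density(math.exp(t), alpha, s, n)
                   - log_underlying_density(math.exp(t)))
    nu_hat = math.exp(golden_section_max(f, -20.0, 20.0))
    return nu_hat ** (2.0 / alpha)


s, n = 10.0, 5
for alpha in (1, 2, 3):
    print(alpha, traditional_map_sigma2(alpha, s, n),
          invariant_map_sigma2(alpha, s, n))
```

The search runs over t = ln ν purely for numerical convenience; the function being maximized is unchanged, so the location of the maximum is the same as a search over ν itself.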

4.5. MMSD estimates. In Section 3.2 we defined a coordinate-invariant
version of the mean squared error estimate, which we called the MMSD
estimate. Having defined a metric on Γ above, we can now use it to calculate
distances in Γ, and hence to define the MMSD estimate. In general, this is
a difficult task that is not tractable analytically, although approximations
may be available. In simple examples, however, one can compute the distance
function d(γ, γ') analytically. We give an example in Section 4.5.1.
4.5.1. MMSD estimate of variance. Consider the same example as above,
of the estimation of the variance of a zero mean Gaussian measure.

From (4.1), the infinitesimal distance ds between the points with
coordinates σ and σ + dσ is given by

\[ ds^2 = \frac{2n}{\sigma^2} \, d\sigma^2. \]

This is easily integrated to give the distance between two points with
coordinates $\sigma_0$ and $\sigma_1$ (assume $\sigma_1 > \sigma_0$):

\[ d(\sigma_0, \sigma_1) = \sqrt{2n} \, \ln\left( \frac{\sigma_1}{\sigma_0} \right). \]

The MMSD estimate of σ is therefore given by considering the following
mean loss under the posterior measure Q for σ:

\[ L(\sigma) = 2n \int_0^\infty d\sigma' \, Q(\sigma') \, (\ln\sigma - \ln\sigma')^2. \]

Differentiation with respect to σ then shows that the minimum squared
distance estimate of σ, $\hat{\sigma}$, is given by

\[ \hat{\sigma} = \exp E_Q[\ln\sigma], \]

where $E_Q[\cdot]$ indicates expectation using the measure Q. Note that
$E_Q[\ln\sigma] \neq \ln E_Q[\sigma]$ in general and that therefore the estimate is not
simply the mean of σ as would have been obtained by assuming a Euclidean
metric.

The mean of ln σ can be calculated in the case that the prior on σ is taken
to be Jeffreys' prior. It is given in terms of coordinates by

\[ E_Q[\ln\sigma] = \frac{1}{2} \left[ \ln\left( \tfrac{1}{2} \langle x, x \rangle \right)
   - \psi\left( \tfrac{n}{2} \right) \right], \]

where ψ is the function

\[ \psi(z) = \frac{d}{dz} \ln\Gamma(z) \]

and Γ is the Gamma function $\Gamma(z) = \int_0^\infty dt \, t^{z-1} e^{-t}$. Thus

\[ \hat{\sigma} = \sqrt{\frac{\langle x, x \rangle}{2}} \; e^{-(1/2)\psi(n/2)}. \]

For large z, ψ(z) ≈ ln(z), so that the estimate becomes

\[ \hat{\sigma} \approx \sqrt{\frac{\langle x, x \rangle}{n}}, \]

the classical result. To the next order, ψ(z) ≈ ln(z) − 1/(2z). This introduces
corrections to the classical result:

\[ \hat{\sigma} \approx \sqrt{\frac{\langle x, x \rangle}{n}} \; e^{1/(2n)}. \]

This formula is valid within about 10% down to n = 1, at which point the
invariant result is bigger than the classical result by a factor of 1.9.
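
The size of the correction is easy to check numerically. The sketch below is an illustration, assuming the closed form exp E_Q[ln σ] with Jeffreys' prior, that is, sqrt(s/2) exp(−ψ(n/2)/2) with s = ⟨x, x⟩. The digamma function is implemented by hand (recurrence plus the standard asymptotic series) only to keep the example free of external dependencies; the function names are hypothetical.

```python
import math


def digamma(x):
    """Digamma function psi(x) for x > 0.

    Uses psi(x) = psi(x + 1) - 1/x to shift x above 6, then the
    asymptotic series ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
    """
    if x <= 0.0:
        raise ValueError("x must be positive")
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    result += (math.log(x) - 0.5 * inv
               - inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0)))
    return result


def mmsd_sigma(s, n):
    """Invariant (MMSD) estimate of sigma under Jeffreys' prior."""
    return math.sqrt(s / 2.0) * math.exp(-0.5 * digamma(n / 2.0))


def classical_sigma(s, n):
    """Classical estimate sqrt(s / n)."""
    return math.sqrt(s / n)


def corrected_sigma(s, n):
    """Classical estimate with the next-order correction exp(1 / (2n))."""
    return classical_sigma(s, n) * math.exp(1.0 / (2.0 * n))


for n in (1, 2, 5, 20, 100):
    exact = mmsd_sigma(1.0, n)
    print(n, exact / classical_sigma(1.0, n), corrected_sigma(1.0, n) / exact)
```

The first printed ratio shows how far the invariant estimate sits above the classical one for each n; the second shows how well the next-order correction tracks the exact expression.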
4.5.2. General case in one dimension. The form of the above estimate
is quite general in the one-dimensional case. Consider that we have derived
the metric on Γ, $\mathbf{h}$. The distance between two points $\gamma_0$ and $\gamma_1$ is then
given according to the general discussion in Section 3. In a general
coordinate system, θ, this can be written

\[ d(\gamma_0, \gamma_1) = \int_{t_0}^{t_1} dt \, \left[ h(\pi(t)) \, \dot{\theta}(t)^2 \right]^{1/2}
   = \int_{\theta_0}^{\theta_1} d\theta \, h^{1/2}(\theta), \]

where $\pi(t_{0,1}) = \gamma_{0,1}$, $\theta_{0,1} = \theta(\gamma_{0,1})$ and h is the (single) component of the
metric $\mathbf{h}$ in the θ coordinate system. Note that there is no need for a
minimization in one dimension. All paths with the same endpoints belong to
the same equivalence class under the action of (boundary- and
orientation-preserving) diffeomorphisms of I. Now let H be the inverse
derivative (antiderivative) of $h^{1/2}$. The (signed) distance between the two
points is now $d(\theta_1, \theta_0) = H(\theta_1) - H(\theta_0)$. Including this in (3.3),
differentiating L and equating to zero then gives the result that

\[ H(\hat{\theta}) = E_Q[H], \]

and thus that

\[ \hat{\theta} = H^{-1}(E_Q[H]). \]

In more than one dimension, of course, the problem is a great deal more
complicated, since there is an infinity of equivalence classes, and the
minimization means solving a partial differential equation for the geodesics.
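
In one dimension the estimate therefore amounts to averaging in the coordinate H and mapping back. The sketch below is a minimal illustration, assuming the posterior Q is represented by equally weighted samples (a Monte Carlo stand-in introduced here, not part of the derivation) and that H and its inverse are supplied in closed form. The function name is hypothetical.

```python
import math


def mmsd_estimate_1d(samples, H, H_inv):
    """Return H_inv(mean of H over the samples).

    samples : posterior draws of the coordinate theta (equal weights)
    H       : antiderivative of the square root of the metric component
    H_inv   : inverse function of H
    """
    mean_H = sum(H(theta) for theta in samples) / len(samples)
    return H_inv(mean_H)


# Gaussian scale example: h = 2n / sigma^2, so H(sigma) = sqrt(2n) ln(sigma).
n = 5
scale = math.sqrt(2.0 * n)
samples = [1.0, 4.0]

log_metric = mmsd_estimate_1d(samples,
                              lambda s: scale * math.log(s),
                              lambda y: math.exp(y / scale))
flat_metric = mmsd_estimate_1d(samples, lambda s: s, lambda y: y)

print(log_metric)   # geometric mean of the samples
print(flat_metric)  # arithmetic mean of the samples
```

With the metric of (4.1) the estimate is the geometric mean of the samples, while a constant metric component reproduces the ordinary mean.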

5. Discussion and related work. There is a significant amount of work 
on the geometry of probability measure spaces from the point of view of 
classical statistics; Murray and Rice (1993) and Kass and Vos (1997) pro- 
vide recent treatments. As interesting as this work is, it has focused on 
asymptotics and other issues of importance to classical statistics, while the 
Bayesian approach using prior and posterior probabilities and loss functions 
has largely been ignored. As a consequence, it is not directly relevant to the 
problem posed in this paper. For example, Murray and Rice (1993) assert 
that the Riemannian distance is not of statistical significance, although they 
give no arguments, and that the mean in a manifold cannot be calculated; 
all that is possible is an analysis of the way in which the value of the mean, 
calculated in coordinates, changes with the coordinates. As we have seen, 
however, the Riemannian metric precisely allows the definition of a natural, 
coordinate-invariant generalization of the mean. 

The pulled back metric defined in Section 4.3 was first introduced by Rao 
(1945), but it was the work of Amari (1985) that brought these ideas to 
prominence. Amari (1985) introduced, in addition to the metric, a family 
of connections on Γ, one of which was the metric connection compatible
with the metric. The nonmetric connections, however, cannot be used to de- 
fine the structures necessary for invariant Bayesian estimation as described 
here. Efron and Hinkley (1978) and Barndorff-Nielsen (1987) introduce "ob- 
served" geometric structures, but again these do not enable the definition 
of invariant estimates satisfying the two criteria in this paper. For exam- 
ple, the observed Fisher information metric of Efron and Hinkley (1978) is 
not a tensor, and thus violates the first criterion. In addition, it requires 
the definition of an underlying measure on the data space X; estimation is 
not invariant to this choice. Critchley, Marriott and Salmon (1994) develop 
"preferred point geometry" to try to ameliorate the lack of naturality they 
perceive in previous geometric approaches to statistics. The "preferred point 
metric" they define is, however, not invariant to diffeomorphisms, precisely 
because there is a preferred point. It thus violates the second criterion. 

There is, from a Bayesian point of view, a more general objection to 
the asymmetric or preferred point structures (many of which also violate 
the triangle inequality) used in much of the above work. This objection is 
essentially the same as the original motivation for introducing them, which is 
the notion that there is a "true distribution" that must be treated differently, 
and related problems, for example, the worry that this distribution might 
not lie in the image of Γ. This notion does not exist, and indeed does not
make sense, in a Bayesian approach. This can be seen by using, for example, 
a preferred point metric in the formula for the posterior density, (3.1). The 
preferred point is undefined, yet if it is taken to be the argument to the 
posterior density, seemingly the only reasonable choice, then the "preferred 
point" vanishes and we are back to the Riemannian metric described herein. 
Thus the raison d'etre of these more complex structures disappears. 

From another direction, Pennec (1999) develops some basic statistical 
tools for Riemannian manifolds, and applies these ideas in various ways to 
problems in computer vision. The approach is not Bayesian, however, and in 
particular the choice of a metric and the relation with estimation problems, 
including the use of the metric measure as an underlying measure for MAP 
estimation, are not considered. 

MML inference was developed by Wallace and Boulton (1968) and Wallace and Freeman 
(1987). A discussion of its relationship with the standard Bayesian approach 
and of its invariance properties can be found in the above papers and in 
the paper by Oliver and Baxter (1995). The literature on MML inference 
frequently cites the invariance of MML estimates as one reason to prefer 
them to MAP estimates. The above analysis shows that this is not a special 
property of MML estimates, or a deep problem with MAP estimates. Indeed,
the issue is not one of MAP estimation per se. Lack of invariance is a
consequence of not describing the quantities of interest in Γ in a
coordinate-invariant, and hence meaningful, way. To do this, one must
recognize that a metric is lurking in the definition of both MAP and MMSE
estimates, and indeed in any useful discussion of Γ, and that making it
explicit is a necessary condition for meaningful definitions in arbitrary
coordinate systems.
Once done, the definition of coordinate-invariant estimates is an immediate 
consequence of the geometry. Although (3.1) with the pulled back metric as 
underlying measure is formally the same as that for MML estimates, unlike 
MML methods, no discretization of Γ is needed, and no approximations are
made. In fact, the above derivation throws light on the procedure used in 
deriving MML estimates, which from this point of view appears to be a 
roundabout way of defining an underlying measure by first discretizing the 
manifold and then considering the volume of each cell. 

The fact that we are discussing the geometry of Γ and not a particular
form of estimate means that the analysis presented here is more general
than MML, however. By recognizing the necessity of an explicit metric on
Γ for inference, the way is open for the definition of coordinate-invariant
loss functions of many different types. Here we have given the example of 
a coordinate-invariant MMSE estimate, the MMSD estimate, but whenever 
defining a loss function on a parameter space, the issues described here must 
be taken into account. 

5.1. Discussion of choice of metric. In Section 4 we came to the
conclusion that the only choice of metric that satisfies the two conditions
mentioned at the beginning of that section is the metric induced by pullback
from M(X). To recap: the metric and its associated underlying measure
should not introduce information about Γ. Such information should be
contained in one of two sources: the correspondence between points of Γ and
points of M(X), and the prior measure. The first leads to the idea that the
metric on diffeomorphically related copies of Γ should be related by
pullback, while the second eliminates the possibility of choosing a metric on
one fixed copy of Γ and then pulling it back to the other copies, since this
implies that we must be able to assign a value of the metric to particular
points in Γ a priori, which in turn implies that we must know something
about the identity of these points beyond the information contained in the
prior. Hence the result given.

Note that this argument is somewhat different from that normally used for 
Jeffreys' prior, or rather is a clarification and a refinement of that argument, 
which essentially boils down to proving that this prior is invariant under 
"reparameterizations." First, the emphasis is on the metric as providing 
Γ with geometry, and not on the measure, which is a derived quantity.
Second, coordinate invariance is not an issue: the abstract way in which 
the geometry is described does not rely on a particular choice of coordinate 
system. Equation (3.1), for example, is coordinate-invariant for any choice 
of metric. Instead the emphasis is on diffeomorphism invariance: our results 
should not depend on which copy of Γ we use, since this merely "shuffles"
the points of Γ without changing their correspondence with points of M(X).

The use of the underlying measure of the pulled back metric does not 
commit us to using Jeffreys' prior as a noninformative prior. Thus the large 
amount of previous work [Bernardo (1979) and Kass and Wasserman (1996)] 
on the choice of such priors, fascinating though it is, is not directly relevant 
to our discussion here. Note in particular that the problems associated with 
Jeffreys' prior do not appear when we are talking about an underlying
measure. First, normalization is not necessary since the underlying measure
is not a probability measure. Second, the procedure advocated here suggests that
we should first eliminate nuisance parameters using whatever prior infor- 
mation we possess, to obtain a likelihood on the parameter of interest, and 
only then derive the metric by pullback. Thus the various "paradoxes" as- 
sociated with the noncommutativity of the derivation of Jeffreys' prior and 
marginalization do not arise. 

Our argument for the metric and underlying measure on Γ does not
depend on group-theoretic considerations. Nevertheless, the metric is
compatible with these considerations, as is Jeffreys' prior, because of the
following simple argument. Let X be a manifold with metric h, and let Y be
embedded in X by f. Suppose we have two group actions $\rho_X : G \times f(Y) \to f(Y)$
and $\rho_Y : G \times Y \to Y$. Note that the group action on X need only be defined
for the image of Y; it may, for example, be induced by the group action on
Y itself. If we have the commutative diagram

\[
\begin{array}{ccc}
Y & \xrightarrow{\;f\;} & f(Y) \\
\rho_Y \downarrow & & \downarrow \rho_X \\
Y & \xrightarrow{\;f\;} & f(Y)
\end{array}
\]

then, if G acts by isometries on X, endowing Y with the metric f*h ensures
that f is an isometry also. Therefore, G must act by isometries on Y. If Y
is G itself, this ensures that the underlying measure induced by the metric
f*h is a Haar measure.
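
As an illustration of this statement, consider the scale example of Section 4.4, with the multiplicative group of positive reals acting on the parameter by σ ↦ cσ. This choice of group and action is an assumption made here for illustration. Taking the metric (4.1) as given, the pullback under the action $\rho_c$ can be computed directly:

```latex
\begin{align*}
\rho_c : \sigma \mapsto c\,\sigma, \qquad
\rho_c^*\!\left( \frac{2n}{\sigma^2}\, d\sigma^2 \right)
  &= \frac{2n}{(c\sigma)^2}\, (c\, d\sigma)^2
   = \frac{2n}{\sigma^2}\, d\sigma^2, \\
\rho_c^*\!\left( \frac{d\sigma}{\sigma} \right)
  &= \frac{c\, d\sigma}{c\,\sigma}
   = \frac{d\sigma}{\sigma}.
\end{align*}
```

The action is thus by isometries, and the induced measure, proportional to dσ/σ, is invariant under it, which is the defining property of a Haar measure on the multiplicative group.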

Finally, an information-theoretic intuition is interesting. In computing the 
MAP estimate, it is equivalent to maximize the logarithm of (3.2). Natu- 
rally the logarithm consists of the difference of two terms: the logarithm of 
the posterior density and the logarithm of the underlying density. The role 
of the underlying density is the following. The information that we possess 
should presumably be that amount of information that we possess beyond 
"ignorance." If our expression for "ignorance" does not possess the value 
"zero" (i.e., the identity) in the algebra in which we add and subtract in- 
formation, then the information that we possess beyond "ignorance" should 
be the difference between the algebraic element representing our knowledge,
and the algebraic element representing "ignorance." In view of the
"noninformative" nature of the underlying measure that we are using, the MAP
estimate can thus consistently be thought of as finding that point in Γ with
maximum information.

This intuition, and the invariant nature of the underlying measure, suggest 
that this measure should be the reference measure for the maximum entropy 
approach to generating prior measures on manifolds. This is a subject for 
further research. 

APPENDIX: FORMS 

We provide a short introduction to the language of forms. A good ref- 
erence is the book by Choquet-Bruhat, DeWitt-Morette and Dillard-Bleick 
(1977). Briefly, differential forms are antisymmetric, multilinear functionals 
on products of vector spaces. For manifolds they are defined pointwise on the 
tangent space at each point and then required to satisfy smoothness prop- 
erties. They also allow a beautiful theory of integration on manifolds, and 
in this capacity they are thought of as co-chains, linear functionals on the
vector space of chains in a manifold. Their advantages are great concision 
and uniformity of notation; independence of basis or coordinates; manifest 
invariance to diffeomorphisms and other transformations; and generality. In 
bringing together integration and geometry in one notation, they are ideal 
for our discussion. 

We are given a manifold Γ. From here, we can define the tangent space
at each point, $T_\gamma\Gamma$, using a number of approaches. The result is intuitively
clear, however, so we will not go into detail. We can bring all the tangent
spaces together in the tangent bundle, TΓ. This is another manifold, each
point of which can be thought of as a pair: a point γ in Γ and a vector in
$T_\gamma\Gamma$. There is a canonical projection from TΓ to Γ supplied by forgetting
the tangent vector. At each point γ, the tangent space $T_\gamma\Gamma$ has a dual space,
$T^*_\gamma\Gamma$, the space of linear maps from $T_\gamma\Gamma$ to ℝ. These can be combined to
form the co-tangent bundle, T*Γ. A vector field is a section of the tangent
bundle: a map from Γ to TΓ whose left inverse is the canonical projection.

We can also form product bundles, in which the "extra space" at each
point γ is the product of copies of the tangent space; thus each point in
$T^p\Gamma$ can be thought of as a point γ and an element of $\otimes^p T_\gamma\Gamma$. Now at
each point we can define higher dual spaces: $T^{*p}_\gamma\Gamma = \otimes^p T^*_\gamma\Gamma$ is the space
of multilinear functions on $\times^p T_\gamma\Gamma$. In particular, we can restrict attention
to the antisymmetric linear functions: those that change sign under the
interchange of any two arguments. These are antisymmetric tensor products
of the co-tangent space, denoted $\wedge^p T^*_\gamma\Gamma$. Their combination into a bundle
is denoted $\wedge^p T^*\Gamma$. A section of $\wedge^p T^*\Gamma$ defines, for each point γ, an element
of $\wedge^p T^*_\gamma\Gamma$. Sections of $\wedge^p T^*\Gamma$ are known as forms, and p is the degree of
the form. We denote the space of p-forms $\Lambda^p\Gamma$. Forms of degree p and q
can be multiplied to give forms of degree p + q. Because the product of
co-tangent spaces is antisymmetric, all forms of degree higher than m, the
dimensionality of the manifold, are zero. 0-forms are functions on Γ.

In order to express vectors and forms more easily, it is convenient to
introduce bases for the various spaces. This is easily done using a coordinate
system $\theta : \Gamma \to \mathbb{R}^m$. A basis for $T_\gamma\Gamma$ is then the set of $\frac{\partial}{\partial\theta^i}(\gamma)$. The dual
basis for $T^*_\gamma\Gamma$ is then the set of $d\theta^i(\gamma)$, which acts on the basis of $T_\gamma\Gamma$ as

\[ d\theta^i \left( \frac{\partial}{\partial\theta^j} \right) = \delta^i_j. \]

Taking the collection of these bases all over Γ, we have bases for the spaces
of vector fields and 1-forms. Now we can form bases for the various power
bundles. For example, a basis for the space of 2-forms is given by the set
$d\theta^i(\gamma) \wedge d\theta^j(\gamma)$, where ∧ denotes the antisymmetric product. We will denote
the basis element $d\theta^1(\gamma) \wedge \cdots \wedge d\theta^m(\gamma)$ of the space of m-forms (there is only
one: if the indices are not all different, antisymmetry of the product means
the result is zero) by $d^m\theta(\gamma)$. The sign of this basis element (or in other
words, the order of the factors of $d\theta^i$ that it contains) defines an orientation
on the manifold, in the sense that a basis for the tangent spaces, when acted
upon by the form, will give either a positive or negative result depending on
its orientation in the traditional sense of right- and left-handed coordinate
systems. Given an orientation in this sense, a basis for the tangent spaces
is either oriented or not. Not all manifolds admit a global orientation. We
consider only orientable manifolds.

Given another manifold Y, and a map Λ : Y → Γ, we define the tangent
map or derivative map at a point y ∈ Y, $\Lambda_* : T_yY \to T_{\Lambda(y)}\Gamma$, as follows. A point
(y, u) ∈ TY is taken to $(\Lambda(y), \Lambda_* u) \in T\Gamma$, where, in terms of coordinates $\theta^i$
on Γ and $y^a$ on Y, in which $u = u^a \frac{\partial}{\partial y^a}$, we have

\[ \Lambda_* u = (\Lambda_* u)^i \frac{\partial}{\partial\theta^i}
   = u^a \frac{\partial\Lambda^i}{\partial y^a} \frac{\partial}{\partial\theta^i}, \]

where $\Lambda^i = \theta^i(\Lambda)$. We also introduce the convention that repeated indices,
one up, one down, are summed over.

Using this map, we can define the pullback Λ*A of a form $A \in \Lambda^p\Gamma$ (or in
fact of any member of a power of a co-tangent space, whether antisymmetric
or not) as

\[ (\Lambda^* A)_y(u, v, \ldots) = A_{\Lambda(y)}(\Lambda_* u, \Lambda_* v, \ldots). \]

Thus the action of a pulled back form on tangent vectors is defined by
the action of the original form on the tangent vectors pushed forward by the
tangent map.
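
In coordinates, pulling back a bilinear object such as a metric reduces to contracting it with the matrix of the tangent map on both sides. The sketch below illustrates this for an example chosen here, not taken from the text: the polar coordinate map (r, φ) ↦ (r cos φ, r sin φ) into the Euclidean plane, whose pulled back metric should have components diag(1, r²). Function names are hypothetical.

```python
import math


def jacobian_polar(r, phi):
    """Matrix of the tangent map of (r, phi) -> (r cos phi, r sin phi).

    Rows index the target coordinates, columns the source coordinates.
    """
    return [[math.cos(phi), -r * math.sin(phi)],
            [math.sin(phi), r * math.cos(phi)]]


def pull_back_metric(jac, g):
    """Components (pullback)_{ab} = sum_{i,j} J[i][a] * g[i][j] * J[j][b]."""
    rows = len(jac)
    cols = len(jac[0])
    return [[sum(jac[i][a] * g[i][j] * jac[j][b]
                 for i in range(rows) for j in range(rows))
             for b in range(cols)]
            for a in range(cols)]


euclidean = [[1.0, 0.0], [0.0, 1.0]]
h = pull_back_metric(jacobian_polar(2.0, 0.7), euclidean)
print(h)  # approximately [[1, 0], [0, 4]]
```

The two factors of the Jacobian correspond to the two tangent vectors being pushed forward before the original bilinear form acts on them.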
As well as antisymmetric products of co-tangent spaces, we can form
symmetric products. If at each point γ we form the space of symmetric,
bilinear functions on $T_\gamma\Gamma \times T_\gamma\Gamma$, which we will denote $T^*_\gamma\Gamma \vee T^*_\gamma\Gamma$, we can
again form a product bundle $T^*\Gamma \vee T^*\Gamma$. A metric h on Γ is a positive
(semi-)definite section of this bundle: to each point γ it assigns a positive
(semi-)definite element of $T^*_\gamma\Gamma \vee T^*_\gamma\Gamma$, or in other words, an inner product
on $T_\gamma\Gamma$.

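In a particular coordinate basis $\frac{\partial}{\partial\theta^i}(\gamma)$, the metric has components given
by

\[ h_{\gamma,ij} = h_\gamma\left( \frac{\partial}{\partial\theta^i}(\gamma), \frac{\partial}{\partial\theta^j}(\gamma) \right). \]

The matrix elements of the metric at each point γ possess a determinant,
which we will write $|h|_\theta(\theta(\gamma))$.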

Using the metric h, we can define a canonical isomorphism, the Hodge
star, between $\Lambda^p\Gamma$ and $\Lambda^{m-p}\Gamma$. We show here its action for p = 0 and
p = m only, since that is all we will need. We choose coordinates $\theta^i$ (nothing
will depend on this choice). Let f be a 0-form, and let $\mathbf{A} = A \, d^m\theta$ be an
m-form (A is a function, the component of $\mathbf{A}$ in the basis $d^m\theta$). Then we
have

\[ *_h f = |h|^{1/2} f \, d^m\theta, \qquad *_h \mathbf{A} = |h|^{-1/2} A, \]

where we have suppressed arguments and reference to the coordinate system
in the definition of the determinant for clarity.
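
A familiar special case may help fix ideas. As an illustration chosen here, take the Euclidean plane in polar coordinates (r, φ), for which the metric components are diag(1, r²) and hence |h| = r². Under that assumption the two actions of the Hodge star read:

```latex
\begin{align*}
|h|^{1/2} &= r, \\
*_h 1 &= r \, dr \wedge d\varphi, \\
*_h \left( A \, dr \wedge d\varphi \right) &= \frac{A}{r}.
\end{align*}
```

The image of the constant function 1 is the usual area element, and applying the star to that 2-form returns the function 1 again.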

The Hodge star can be used to define an inner product on each $\Lambda^p\Gamma$. Since
$*_h A$ is an (m − p)-form if A is a p-form, the quantity $A \wedge *_h B$ for two p-forms
is an m-form, and can be integrated on Γ:

\[ \langle\langle A, B \rangle\rangle = \int_\Gamma A \wedge *_h B. \tag{A.2} \]

We can define positive m-forms as those whose action on oriented bases
produces a positive result. It is equivalent to say that their dual under the
action of the Hodge star is a positive function. A probability m-form is a
positive m-form whose integral over Γ is equal to 1. We can divide m-forms
by positive m-forms. For an m-form A and a positive m-form B, the value of
A/B is that unique function f such that A = fB. This division is the analogue
of the Radon-Nikodym derivative for forms.

On an m-dimensional manifold, m-forms can be integrated in the way
that the notation suggests. For an m-form $\mathbf{A} = A \, d^m\theta$, we have that

\[ \int_{\Omega \subset \Gamma} \mathbf{A} = \int_{\theta(\Omega)} A(\theta) \, d^m\theta, \]

where we have used the same symbol A for the function and its expression
in terms of coordinates.

To integrate a p-form A over a p-dimensional submanifold embedded in
Γ, Λ : Y → Γ, one first pulls the form back to the embedded manifold and then
integrates:

\[ \int_{\Lambda(Y)} A = \int_Y \Lambda^* A. \]

These definitions highlight the second way of interpreting forms: as
co-chains. A p-chain in Γ is (roughly speaking) a linear combination of
p-dimensional rectangles embedded in the manifold. The space of linear
functions on the space of p-chains (the co-chains) can be identified with $\Lambda^p\Gamma$.

We will have cause to integrate a function f over a p-dimensional
submanifold Y of Γ. This is slightly different from the case of integrating
a p-form. One first pulls the function back to Y and then uses a metric on
Y to convert the function into a p-form that can be integrated over Y:

\[ \int_{\Lambda(Y)} f = \int_Y *_h (\Lambda^* f), \]

where by definition (Λ*f)(y) = f(Λ(y)), and h is a metric on Y.

However, since we are interested in the submanifold in Γ and not Y itself,
we are really considering an equivalence class of embeddings {Λε}, where
ε : Y → Y is a diffeomorphism, with the same image. The result of our
integration should be independent of the representative in this equivalence
class, and this means that the metric on Y must vary with the representative.
If no representative is distinguished, the only way to achieve this invariance
is to pull back a metric g on Γ to Y, and use this metric to define the Hodge
star: